C/C++ Users Group Library 1996 July

home *** CD-ROM | disk | FTP | other *** search

/ C/C++ Users Group Library 1996 July / C-C++ Users Group Library July 1996.iso / vol_100 / 129_01 / hack.doc < prev next >

Wrap

Text File | 1985-03-09 | 22.8 KB | 500 lines

/************************************************************************/ /* hack.doc */ /************************************************************************/ /************************************************************************/ /* History */ /* */ /* 82Dec07 CrT Completed. */ /* 82Nov23 CrT Created. */ /************************************************************************/ /************************************************************************/ /* Audience */ /* */ /* Folks trying to read or modify the Citadel code. */ /************************************************************************/ /************************************************************************/ /* Purpose */ /* */ /* Explain the basic data structures and algorithms. */ /************************************************************************/ /************************************************************************/ /* Overview */ /************************************************************************/ The fundamental structure of the system is very simple. CtdlMsg.sys is a circular file. New messages are simply written around the buffer in an endless circle, overwriting old messages in their way. There is no other way of deleting messages, and text is never shuffled on disk. Messages are numbered consecutively and start with an FF (hex) byte. Except for this FF start-of-message byte, all bytes in the message file have the high bit set to 0. This means that in principle it is trivial to scan through the message file and locate message N if it exists, or return error. (Complexities, as usual, crop up when we try for efficiency...) Each room is basically just a list of message numbers. Each time we enter a new message in a room, we slide all the old message-numbers down a slot, and probably the oldest one falls off the bottom. Reading a room is just a matter looking up the messages one by one and printing them out. If the message has been overwritten already, we just ignore it. Implementing the "new message" function is also trivial in principle: we just keep track, for each caller in the userlog, of the highest-numbered message which existed on the >last< call. (Remember, message numbers are simply assigned sequentially each time a message is created. This sequence is global to the entire system, not local within a room.) If we ignore all message-numbers in the room less than this, only new messages will be printed. Voila! Code up the system described thus far, and you'll have a good approximation to Version 1. Better stop and reread everything to here, so you can pick out the fundamental mechanisms among all of Version 2's bells and whistles. /************************************************************************/ /* message format on disk (ctdlMsg.sys) */ /************************************************************************/ Message format has changed relative to V1, in the direction of using more disk space and providing greater flexibility. A message now consists of a sequence of character strings. Each string begins with a type byte indicating the meaning of the string and is ended with a null. All strings are printable ASCII: in particular, all numbers are in ASCII rather than binary. This is for simplicity, both in implementing the system and in implementing other code to work with the system. For instance, a database driven off Citadel archives can do wildcard matching without worrying about unpacking binary data such as dates first. To provide later downward compatability, all software should be written to IGNORE fields not currently defined. /************************************************************************/ /* The type bytes currently defined are: */ /************************************************************************/ BYTE Mnemonic Comments 0xFF Start-of-message indicator. Followed by local message ID# as ASCII string, null-terminated as always. This byte is the >only< byte which has the high bit set in a Citadel message.buf file. This field must be present in every message. A Author Name of originator of message. D Date Date message was created. M Message Text of message. Is last field in a message, by definition. Following data will be ignored. This field must be present in every message. N Name Human name for node originated on. Used on title line of foreign messages. Ex: ODD-DATA will produce a title message something like 82Nov23 from Cynbe ru Taren @ODD-DATA O Origin ID of node message originated on: Country code plus phone number of system. (Not stored for locally originated messages.) Ex: US 206 633 3282 R Room Room of origin. Topic. S Source ID# Message ID # on system message was created on. Two 16-bit integers (high and low halves of full 32-bit ID#) separated by a blank. Ex: 0 13654 T To Addressee. Used only for private messages in Mail>, in version 2.00 . EXAMPLE Let <FF> be a 0xFF byte, and <0> be a null (0x00) byte. Then a message which prints as LOGLAN> read new 82Nov04 From James Cooke Brown Loi uiue i Ti logla LOGLAN> might be stored as <FF>0 3583<0>D82Nov04<0>AJames Cooke Brown<0>RLOGLAN<0>MLoi uiue i Ti logla<0> |--Local ID--|---Date---|-----Author---------|--Room---|-------Message--------| The date, room and author fields could be in any order. Not all fields are printed by default. The local ID# and Room field are suppressed here. An isolated system will not normally have use for fields beyond those used in this example. Lines are marked with C NewLine (ASCII LF) characters, within the message text proper. /************************************************************************/ /* Networking */ /************************************************************************/ Citadel nodes network by sharing one or more rooms. Any Citadel node can choose to keep an image of any public room on any other Citadel node (subject to willingness to foot the phone bills, of course!). The procedure in essence simply involves calling the imaged node up periodically and copying any new messages in the imaged room into the local image. There is no necessary reciprocity or pre-arrangement, although convenience and politeness respectively suggest both. The node which gets the information foots the phone bill for the transaction. This promotes simple and harmonious relations between the nodes. Complexities arise primarily from the possibility of densely connected networks: one does not wish to accumulate multiple copies of a given message, which can easily happen. Nor does one want to see old messages percolating indefinitely through the system. This problem is handled by a simple brute-force mechanism: each node keeps a list of all messages it has seen recently, recording origin system, creation date, and original ID#. When downloading, messages which have already been seen, or which are too old to be remembered, are skipped. Messages can percolate outward through a large network with no global routing or control, but do not reproduce wildly or cycle indefinitely. The above discussion should make the function of the new fields reasonably clear: o Every message needs a local ID#, so the system can determine if a given caller has seen it before. o Travelling messages need to carry system of origin, date of origin, and original ID# with them, to keep reproduction and cycling under control. (Uncoincidentally) the format used to transmit messages for networking purposes is precisely that used on disk, except that lines are marked with ASCII CR characters in stead of ASCII LF characters. The current distribution does not include active network support code; these comments are included in the hope of discouraging people from modifying the disk format, date field or whatever in ways which will make their local node incompatable with the network at large. /************************************************************************/ /* portability problems */ /************************************************************************/ Modem code is covered elsewhere. If porting to a C compiler other than BDS C, various problems are likely where low-level I/O takes place. Look in particular at: bios() direct BIOS calls -- console access mostly. getc() getw() inp() input from Z80 port. Mostly modem-related. movmem() block transfer in RAM outp() output to Z80 port. Mostly modem-related. putch() console output rsrec() set random-access mode for file rread() random file read by sectors rwrite() random file write by sectors _spr() mprintf(), dprintf() use this printf() support routine. /************************************************************************/ /* "Room" records (ctdlRoom.sys) */ /************************************************************************/ The rooms are basically indices into ctdlMsg.sys, the message file. As noted in the overview, each is essentially an array of pointers into the message file. The pointers consist of a 16-bit message ID number (we will wrap around at 64K for these purposes) together with a 16-bit sector offset within ctdlMsg.sys telling us where the message begins. Since messages are number sequentially and written circularly, the set of messages existing in ctdlMsg.sys will always form a continuous sequence at any given time. Thus, by remembering the ID numbers of the oldest and newest messages in the message file, we can check to see if a message exists >before< going to disk, saving ourselves (and the disk drive) the pain of futile seeks in search of the nonexistent. This information is recorded in oldestHi/oldestLo (16-bit halves of 32-bit numbers, of which we use the low half here) and newestHi/newestLo. You'll be seeing more of these... The newestHi/newestLo is simply incremented each time we enter a new message in the message files. OldestHi/oldestLo is incremented each time we overwrite an FF (start-of-message) byte in the course of writing a new message into the files. This corresponds to dead reckoning -- current code never checks to see that the message number of the message we are overwriting is what we think it is. In a garbaged file with extra FF bytes around, this could cause oldestHi/oldestLo to count too rapidly, eventually perhaps overtaking newestHi/newestLo, at which time the system will look completely empty. If you suspect something like this is going on, just use configur.com to rebuild accurate numbers. That should be enough background to tackle a full-scale room. From 200ctdl.h : struct { char rbgen; /* generation number of room */ char rbflags; /* same bits as flags above */ char rbname[NAMESIZE]; /* name of this room */ char rbdisk; /* disk this room's files are in */ char rbuser; /* user area for this room's files */ union { struct { unsigned rbmsgNo; /* every message gets unique# */ int rbmsgLoc; /* sector message starts in */ } msg[MSGSPERRM]; /* we'll ignore the other component here -- not used currently */ } vp; char rbOverRun[2]; /* actually unneeded just now */ } roomBuf; [Note that all components start with "rb" for roomBuf, to make sure we don't accidentally use an offset in the wrong structure. Be very careful also to get a meaningful sequence of components -- roomBuf.vp.msg[i].rbmsgNo; not roomBuf.msg[i].rbmsgNo; and certainly not roomBuf.rbmsgNo; BDS C provides no checking on this sort of stuff either. (Fast compilation on an 8080 has its prices!)] Rbgen handles the problem of rooms which have died and been reborn under another name. This will be clearer when we get to the log file. For now, just note that each room has a generation number which is bumped by one each time it is recycled. Rbflags is just a bag of bits recording the status of the room. The defined bits are: Bit 0: INUSE. 1 if the room is valid, 0 if it is free for re-assignment. Bit 1: PUBLIC. 1 if the room is visible by default, else 0. Bit 2: CPMDIR. 1 if the room is a window onto some disk/userspace, else 0. Bit 3: PERMROOM. 1 if the room should not be recycled even if empty, else 0. (Lobby>, Mail> and all CPMDIRs are automatically permanent.) Rbname is just an ASCII string (null-terminated, like all strings) giving the name of the room. Rbdisk and rbuser are meaningful only in CPMDIR rooms, in which case they give the disk (0 == A:, 1==B: ... 15==P:) and userspace (0..31) to window. Finally, msg is the array of pointers into the message file. RbmsgNo is the ID number of the message, and RbmsgLoc is the sector it begins in. (For NIL, we stick the value -1 in RbmsgLoc.) RbOverRun[] at the tail end is a little kludge. We read disk sectors directly into roomBuf, so it has to be two sectors long or something following will be overwritten. If you resize msg[], be sure to recalculate the length of roomBuf and resize rbOverRun[] accordingly. RoomTab is just a little index into ctdlRoom.sys, to keep us from constantly pounding around on the disk looking for things. It basically records the name and status of each room. It is 100% redundant with the file itself... as all our indices are. (As all indices should be!) Note that RoomTab is a significant consumer of RAM all by itself. It is RAM well spent, but if you have to shave Citadel a few K to make it fit your system, cutting the number of rooms a bit is one try. The only field new to us in roomTab is rtlastMessage, recording the most recent message in the room. When we are searching for rooms with messages a given caller hasn't seen, we can check this number in RAM and avoid unprofitable disk accesses. /************************************************************************/ /* log records (ctdlLog.sys) */ /************************************************************************/ This is the fun one. Get some fresh air and plug in your thinking cap first. (Time, space and complexity are the eternal software rivals. We've got 256 log entries x about 500 messages spread over up to 128 rooms to worry about, and with floppies disk access time is important... so perforce, we opt for lots of complexity to keep time and space in bounds.) To understand what is happening in the log code takes a little persistence. You also have to disentangle the different activities going on and tackle them one by one. o We want to remember some random things such as terminal screen size, and automatically set them up for each caller at login. o We want to be able to locate all new messages, and only new messages, efficiently. Messages should stay new even if it takes a caller a couple of calls to get around to them. o We want to remember which private rooms a given caller knows about, and treat them as normal rooms. This means mostly automatically seeking out those with new messages. (Obviously, we >don't< want to do this for unknown private rooms!) This has to be secure against the periodic recycling of rooms between calls. o We want to support private mail to a caller. o We want to provide some protection of this information (via passwords at login) and some assurance that messages are from who they purport to be from (within the system -- one shouldn't be able to forge messages from established users). Lifting another page from 200ctdl.h gives us: struct logBuffer { char lbnulls; /* # nulls to print after newline */ char lbflags; /* UCMASK, LFMASK, EXPERT */ char lbwidth; /* terminal width */ char lbname[NAMESIZE]; /* caller's name */ char lbpw[NAMESIZE]; /* caller's password */ char lbgen[MAXROOMS]; /* 5 bits gen, 3 bits lastVisit */ char lbvisit[MAXVISIT]; /* newestLo on last few calls */ char lbslot[MAILSLOTS]; /* for private mail */ char lbId[MAILSLOTS]; /* for private mail */ char lbOverFlow[] /* */ } Looks simple enough, doesn't it? One topic at a time: RANDOM CONFIGURATION PARAMETERS These are in the first three bytes in the record. Lbnulls gives the number of nulls to print after a newline, for folks who like simultaneous hardcopy. Or any remaining ASR33 aficionados calling up... Lbwidth is the caller's screen width. We format all messages to this width, as best we can. Lbflags is another bit-bag, recording uppercase-only folks, people who need a linefeed after a carraige-return, and people who want to suppress the little automatic hints all through the system. FINDING NEW MESSAGES This is the most important. Thus, it winds up being the most elaborate. Conceptually, what we would like to do is mark each message with a bit after our caller has read it, so we can avoid printing it out again next call. Unfortunately, with 256 log entries this would require adding two sectors to each message... and we'd wind up reading off disk lots of messages which would never get printed. So we resort to an arcane mixture of approximation and low animal cunning. The approximation comes in doing things at the granularity of rooms rather than messages. Messages in a given room are "new" until we visit it, and "old" after we leave the room... whether we read any of them or not. This can actually be defended: anyone who passes through a room without reading the contents probably just isn't interested in the topic, and would just as soon not be dragged back every visit and forced to read them. Given that messages are numbered sequentially, we can simply record the most recent message ID# of each room as of the last time we visited it. With 128 rooms, this would give us (for each user) an array of 128 integers, or 256 bytes. This is still too much -- I'd like the complete log record for a user to be 256 bytes or less, and we have other stuff to do yet. So, we complicate a little more. We record in lbvisit[MAXVISIT] the most recent message ID# in the system on each of the last six calls or so. Now, for each room, we can just indicate which call we last visited the room on. This takes 3 bits per room, which we stash in the low three bits of lbgen[MAXROOMS]. Now we're down to 128 rooms x 3 bits (plus a few bytes in lbvisit[], of course), which is quite reasonable. Putting it all together, we can now compute whether a given room has new messages for our current caller without going to disk at all: > We get the lbgen[] entry for the room in question > We mask off the lower 3 bits > We use the result as an index into lbvisit[], getting the ID number of the most recent message in the system as of the last time we visited the room. > We compare this with roomTab[].rtlastMessage, which tells us what the most recent message in the room is currently. REMEMBERING WHICH PRIVATE ROOMS TO VISIT This looks trivial at first glance -- just record one bit per room per caller in the log records. The problem is that rooms get recycled periodically, and we'd rather not run through 256 log entries each time we do it. So we adopt a kludge which should work 99% of the time. As previously noted, each room has a generation number, which is bumped by one each time it is recycled. As not noted, this generation number runs from 0 -> 31 (and then wraps around and starts over). Thus, these numbers take 5 bits to represent. By a miraculous coincidence, we have exactly 5 bits left in the lbgen[] entries in the log records. [Anyone familiar with "capability" pointers will be encountering deja vu here...] When someone visits a room, we set the generation number in lbgen[] equal to that of the room. This flags the room as being available. If the room gets recycled, on our next visit the two generation numbers will no longer match, and the room will no longer be available -- just the result we're looking for. (Naturally, if a room is marked as PUBLIC, all this stuff is irrelevant.) This leaves only the problem of an accidental matchup between the two numbers giving someone access to a Forbidden Room. We can't eliminate this danger completely, but it can be reduced to insignificance for most purposes. (Just don't bet megabucks on the security of this system!) Each time someone logs in, we set all "wrong" generation numbers to be one less than the actual generation of the room. This means that the room must be recycled thirty-one times before an accidental matchup can be achieved. (We do this for all rooms, INUSE or dead, public or private, since any of them may be reincarnated as a Forbidden Room.) Thus, for someone to accidentally be lead to a Forbidden Room, they must establish an account on the system, then not call until some room has been recycle thirty-one to thirty-two times, which room must be reincarnated as a Forbidden Room, which someone must now call back (having not scrolled off the userlog in the mean time) and read new messages. The last clause is about the only probable one in the sequence. The danger of this is much less than the danger that someone will simply guess the name of the room outright... SUPPORTING PRIVATE MAIL Can one have an elegant kludge? This must come pretty close. Private mail is sent and recieved in the Mail> room, which otherwise behaves pretty much as any other room. To make this work, we store the actual message pointers in lbslot[] and lbId[] in the caller's log record, and then copy them into the Mail> room array whenever we enter the room. This requires a little fiddling to get things just right. We have to update roomTab[MAILROOM].rtlastMessage at login to reflect the presence or absence of new messages, for example. And MakeMessage() has to be kludged to ask for the name of the recipient of the message whenever a message is entered in Mail>. But basically it works pretty well, keeping the code and user interface simple and regular. PASSWORDS AND NAME VALIDATION LogTab[] indexes ctdlLog.sys, giving us a quick way of finding people. It is basically a chronologically sorted hash table. We keep a two-byte hash of the name and password of each caller in RAM. When someone tries to log in, we just whip through the table in order looking for matches on the password hash and loading the matching logfile entry in. Bogus hits are eliminated by the simple expedient of refusing to acknowledge a new user who's name or password hashes on top of an existing user. Computer chauvinism at it's best... This makes it difficult to forge messages from an existing user. (Fine point: nonprinting characters are converted to printing characters, and leading, trailing, and double blanks are deleted.)